Fix reset_index crash when obs index name matches existing column #1100
timtreis wants to merge 3 commits into
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1100 +/- ##
=======================================
Coverage 91.93% 91.93%
=======================================
Files 51 51
Lines 7772 7776 +4
=======================================
+ Hits 7145 7149 +4
Misses 627 627
    name collides with an existing column (e.g. "EntityID" in Merfish data).
    """
    if df.index.name is not None and df.index.name in df.columns:
        return df.reset_index(drop=True)
So here we are dropping the index instead of a column. This could be a problem if the column values differ from the index values.
What's the origin of this problem? Is it that for the Merfish data we should have had `sdata['table'].obs.index.name = None`? That would be a better fix, while still raising an exception here.
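To make the concern concrete, here is a minimal sketch in plain pandas. The toy `obs` frame and its values are made up for illustration and do not come from the Merfish dataset. It shows that `reset_index(drop=True)` discards index values that differ from the column, while clearing the index name first keeps both.

```python
import pandas as pd

# Hypothetical toy frame: the index and the "EntityID" column share a name
# but hold different values.
obs = pd.DataFrame(
    {"EntityID": [1, 2, 3]},
    index=pd.Index([101, 102, 103], name="EntityID"),
)

# Dropping the index silently discards 101, 102, 103.
dropped = obs.reset_index(drop=True)

# Clearing the index name first avoids the collision and keeps both:
# the old index is inserted as a column named "index".
unnamed = obs.copy()
unnamed.index.name = None
kept = unnamed.reset_index()
```

With the name cleared, `kept` has the columns `index` and `EntityID`, so nothing is lost.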
@pytest.mark.parametrize("how", ["inner", "left"])
def test_join_spatialelement_table_obs_index_name_collision(how):
    """join_spatialelement_table must not crash when obs index name matches an existing column.
I think it should crash in general to avoid a silent branching in the behavior. The ok case would be if both the index and the column have the same values.
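A rough sketch of that stricter behavior, assuming plain pandas. The helper name `reset_index_strict` and the error message are hypothetical and not part of this PR or of the library. It drops the index only when it is identical to the colliding column, and raises otherwise.

```python
import pandas as pd


def reset_index_strict(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reset the index, raising on an ambiguous name collision."""
    name = df.index.name
    if name is not None and name in df.columns:
        # Only the "ok case": index and column carry the same values,
        # so dropping the index loses no information.
        if not df.index.equals(pd.Index(df[name])):
            raise ValueError(
                f"Index name {name!r} collides with a column holding different values."
            )
        return df.reset_index(drop=True)
    # No collision: the regular reset_index is safe.
    return df.reset_index()
```

This keeps a single, explicit failure mode instead of silently choosing between the index and the column.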
Since the code path triggering this has been addressed in If someone wants to iterate on this, feel free to reopen (and in doing so, please address the review comments I added).
Summary
- `_inner_join_spatialelement_table` and `_left_join_spatialelement_table` call `table.obs.reset_index()`, which raises `ValueError: cannot insert <name>, already exists` when the obs index name matches an existing column (e.g. `EntityID` in Merfish data)
- Add `_reset_index_preserving_existing_columns`, which drops the index when its name already exists as a column, avoiding the collision
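
The crash can be reproduced outside the library with plain pandas. The snippet below is a minimal sketch using a made-up stand-in for `table.obs` (the `region` column and all values are illustrative only). It triggers the `ValueError` and then applies the `drop=True` approach described above.

```python
import pandas as pd

# Toy stand-in for table.obs: the index is named "EntityID" and a column
# with the same name already exists.
obs = pd.DataFrame(
    {"EntityID": [10, 11, 12], "region": ["a", "a", "b"]},
    index=pd.Index([10, 11, 12], name="EntityID"),
)

try:
    obs.reset_index()
    error_message = None
except ValueError as err:
    # pandas refuses to insert the index as a column with an existing name.
    error_message = str(err)

# The approach in this PR: discard the index rather than inserting it.
fixed = obs.reset_index(drop=True)
```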